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Even during periods of fixation our eyes undergo small amplitude movements. These 
movements are thought to be essential to the visual system because neural responses 
rapidly fade when images are stabilized on the retina. The considerable recent interest 
in fixational eye movements (FEMs) has thus far concentrated on idealized experimental 
conditions with artificial stimuli and restrained head movements, which are not necessarily 
a suitable model for natural vision. Natural dynamic stimuli, such as movies, offer the 
potential to move beyond restrictive experimental settings to probe the visual system 
with greater ecological validity. Here, we study FEMs recorded in humans during the 
unconstrained viewing of a dynamic and realistic visual environment, revealing that drift 
trajectories exhibit the properties of a random walk with memory. Drifts are correlated at 
short time scales such that the gaze position diverges from the initial fixation more quickly 
than would be expected for an uncorrelated random walk. We propose a simple model 
based on the premise that the eye tends to avoid retracing its recent steps to prevent 
photoreceptor adaptation. The model reproduces key features of the observed dynamics 
and enables estimation of parameters from data. Our findings show that FEM correlations 
thought to prevent perceptual fading exist even in highly dynamic real-world conditions. 
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1. INTRODUCTION 

Fixational eye movements (FEMs) have seen considerable recent 
interest for their roles in perception and oculomotor con- 
trol (Martinez-Conde et al., 2004; Engbert, 2006; Rolfs, 2009; 
Martinez-Conde et al, 2013). Their cause and functional signif- 
icance remain poorly understood, but they appear to be much 
more than a noisy inconvenience — neural responses fade rapidly 
in their absence (Coppola and Purves, 1996). 

Eye movements are broadly categorized into fixations (where 
the eyes are relatively still), pursuits (eyes smoothly tracking a 
moving target), and saccades (rapid movements between fixa- 
tions). Fixations and pursuits are particularly important as it is 
during these periods that visual information is extracted from the 
world. Three types of FEM perturb this extraction: drift, tremor, 
and microsaccades (Martinez-Conde et al., 2009; Rolfs, 2009). 
Drifts are slow eye movements that follow apparently random tra- 
jectories and carry retinal images across several photoreceptors 
during typical fixations. Tremor has a broadband frequency spec- 
trum (30-120 Hz, peak ~80 Hz), slightly perturbing drifts by < 1 
photoreceptor width (McCamy et al, 2013). Microsaccades are 
similar to regular saccades but have smaller amplitude similar to 
the displacements caused by drifts (Rolfs, 2009). 

Fixational eye movements have been widely assumed to be 
a random uncorrelated process similar to Brownian motion 
(Pitkow et al, 2007; Burak et al, 2010; Kuang et al, 2012). Recent 
studies have shown that drift trajectories during fixation tasks 



contain non-trivial temporal correlations (Engbert and Kliegl, 
2004; Mergenthaler and Engbert, 2007). On short time scales, 
correlations cause gaze to wander faster than expected for a 
normal diffusive process. Such processes are termed superdif- 
fusive (Metzler and Klafter, 2000) or persistent (Codling et al, 
2008). On longer time scales, trajectories are anticorrelated (or 
antipersistent), tending to wander slower than normal diffusion, 
consistent with a subdiffusive process (Engbert and Kliegl, 2004; 
Mergenthaler and Engbert, 2007). 

Short-time superdiffusion is thought to refresh retinal images, 
with longer-time subdiffusion keeping gaze near fixation targets 
(Engbert and Kliegl, 2004). However, it is unknown whether 
these correlations exist in natural vision — superdiffusion may 
be unnecessary in rich dynamic environments, and subdiffusion 
could be an artifact of lengthy forced fixation. Natural scenes in 
films are inherently dynamic and arguably a better model of nat- 
ural environments than typical fixation-task stimuli, and thus are 
an increasingly important paradigm in neuroscience (Felsen and 
Dan, 2005). Well-directed films are also particularly engaging for 
viewers (Hasson et al., 2004, 2010), sustaining attention during 
lengthy data acquisitions. Drifts and microsaccades have not (to 
our knowledge) been previously studied together during dynamic 
natural scene viewing. 

Here, we characterize FEMs in natural vision by recording 
eye movements during film viewing. We show that gaze trajecto- 
ries are well-described by a correlated random walk, with scaling 
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properties similar to those in forced fixations. Moreover, we pro- 
pose a model that generates the observed short-time correlations 
via an imprecise memory of recently visited gaze locations, and 
show that it captures the superdiffusive nature of FEMs at short 
time scales (< 100 ms). Finally, we show as a proof of princi- 
ple that the model can be inverted to estimate its parameters 
from data. 

2. MATERIALS AND METHODS 
2.1. DATA ACQUISITION 

Eight healthy subjects (3 female, mean age 25.9 years, range 
22-28) viewed the Alfred Hitchcock (1948) film Rope (dura- 
tion 77 min) on an LCD monitor, with the audio stream played 
through headphones. Rope is notable for having only ten direc- 
tor's cuts (five of which are masked) and is thus an almost 
completely continuous audiovisual stream. Subjects provided 
informed consent and the study protocols were approved by 
ethics boards of the Queensland Institute of Medical Research and 
the University of Queensland in accordance with the Declaration 
of Helsinki. Analysis was restricted to the 75 min segment begin- 
ning at 1 min 53 s spanning the time between the opening and 
closing credits. The video was presented in the center of the 
screen (surrounded by a black background), subtending hori- 
zontal and vertical angles of 20° and 16°, respectively (screen 
resolution ~35 px/degree). Eye movements were recorded with an 
SR Research EyeLink II eye tracker that allowed the head to move 
freely, sampling at fs = 500 Hz using infrared cameras to record 
pupil position and correct for head motion. Subjects optionally 
took breaks every 15 min, with re-calibration on resuming the 
recording. 

Fixations were identified as the intervals between saccades, 
which were detected by the eye tracker using velocity and accel- 
eration thresholding. A saccade was deemed to be in progress 
if eye-movement velocity exceeded 30°/s and accelerations were 
in excess of 8000°/ s^. A saccade was said to start if these cri- 
teria were met for over more than two sampling periods, and 
continued as long as the criteria were met again within the next 
20 ms. The eye tracker detected blinks as corresponding to a loss 
of pupil visibility, which is accompanied by spurious saccades 
immediately before and after the detected blink; both these sac- 
cades and the following 50 ms of gaze samples were discarded 
as part of the one blink. Blinks closer than 100 ms were merged 
and those gaze points in between discarded. Saccade overshoots 
and saccade onsets were deleted from the fixations by discarding 
6 ms at the start and 4 ms at the end of each fixation, respec- 
tively. Both eyes were recorded, but only the left eye's data was 
used in all subsequent analysis; results for the right eye were 
similar. 

The PDFs of step lengths and turn angles were estimated from 
histograms of these quantities aggregated across all fixations in 
each subject. Because the eye tracker reports gaze coordinates to 
the nearest 0.1 px, histograms of the raw steps are sharply dis- 
cretized, particularly for the turn angles. Thus we added small 
amplitude noise to all gaze points uniformly distributed between 
— 0.05 px and 0.05 px to smear the discretization imposed by finite 
measurement precision. This only affects the step length distri- 
bution for very short steps of <0.1px so this perturbation is 



negligible for most steps but sufficient to yield smooth turn angle 
PDFs. 

3. RESULTS 

Table 1 summarizes fixation counts and durations for each sub- 
ject. Note that Subject I's recording was limited to the first 50 min 
of the film. Subjects made 1.7-2.7 fixations per second, result- 
ing in fixation durations (mean across all subjects 424 ms, median 
290 ms) an order of magnitude or more shorter than used in typ- 
ical fixation tasks (Engbert and Kliegl, 2004; Mergenthaler and 
Engbert, 2007). 

3.1. PROPERTIES OF FIXATIONAL EYE MOVEMENTS 

Motivated by the apparently random gaze trajectories observed 
during fixations, we analyze each trajectory as a random walk 
composed of a series of steps joining the measured gaze coordi- 
nates (Figure lA). The steps are equally spaced in time because 
of the digital acquisition, each described by its length (which 
is proportional to the instantaneous velocity) and turn angle 6 
measured relative to the previous step. The steps can thus be 
treated as vectors (see inset). Step distributions (Figures 1B,C) 
reveal anisotropy in the random walk trajectories. Forward steps 
(9 ~ 0) are both more numerous and longer than those in other 
directions, as indicated by the central triangular zone in the 
single-subject step distribution shown in Figure IB. Thus the 
eye tends to maintain its direction during drifts, consistent with 
short-time persistence in the trajectories (Engbert and Kliegl, 
2004). This does not rule out backward steps; on the contrary, 
drifts in the reverse direction are more probable than left or right 
turns (Figure IC). 

Step lengths are unimodaUy distributed (Figure ID), approxi- 
mately following a lognormal distribution (Figure 2). This long- 
tailed distribution implies that long steps are more frequent 
than expected for a normally-distributed variable, but there is 
no evidence for a second mode of outliers that would suggest 
the presence of microsaccades (Engbert and Mergenthaler, 2006). 
Such a mode would be expected if a significant number of fixa- 
tions were punctuated by a large number of flighty gaze paths an 
order of magnitude faster than the drifts. 

As a more sensitive test for microsaccades we apply the detec- 
tion method of Engbert and Mergenthaler (2006), restricting 
attention to binocular microsaccades and discarding as false pos- 
itive saccade overshoots any within a 30 ms interval centered 



Table 1 | Summary of subject fixation statistics. 



Subject 


N 


Mean dur (ms) 


Hi 


^peak 


Atpeak (ms) 


1 


4848* 


553 


1.53 


1.66 


6 


2 


10230 


404 


1.34 


1.61 


14 


3 


11932 


334 


1.50 


1.58 


6 


4 


10106 


388 


1.36 


1.52 


14 


5 


8870 


461 


1.42 


1.51 


10 


6 


9009 


421 


1.37 


1.53 


14 


7 


9219 


415 


1.37 


1.44 


14 


8 


7896 


524 


1.36 


1.40 


6 



'Denotes recording aborted after 50 min. 
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FIGURE 1 I Fixational eye movements and distributions of steps. (A) 

Example fixation gaze trajectory, with inset showing four steps as vectors 
joining gaze points. (B) Joint step length and turn angle distribution for one 



subject. (C) Turn angle distributions for eight subjects. (D) Step length 
distributions (log transformed) for eight subjects. Subjects 1-8 colored in 
order red, yellow, light green, green, cyan, light blue, blue, purple. 



on (normal) saccades, similar to a study of free-viewing data 
(Mergenthaler and Engbert, 2010). The method protects against 
noise by comparing the detected microsaccade rate to that of 
amplitude-adjusted Fourier-transformed surrogate data that pre- 
serves velocity distributions — the excess microsaccade rate in the 
data over the surrogates thus estimates the true microsaccade rate. 
Using this more sensitive method, we find 0.77 microsaccades per 
second (range 0.5-1.1 s^^) averaged across all subjects, slightly 
lower but broadly consistent with the rate 1.0±0.4s^^ reported 
during free-viewing of static natural stimuli (Mergenthaler and 
Engbert, 2010). Given typical fixation lengths of ~400ms, a rate 
below 1 s^' implies that a majority of fixations do not contain 
microsaccades. 

Moving beyond the statistics of individual steps, correlations 
between steps modify the rate at which gaze moves away from an 
initial point. The mean square displacement (Ax^) after a time 
interval Af scales as (Ax^) oc At^ with scaling exponent H = I 
for normal diffusion, H > 1 for superdiffusion or a persistent 
random walk, and H < 1 for subdiffusion or an antipersistent 
random walk. The estimator for ( Ax^> is given by 

^ N—m 

D\m) = - V ||x,+ „-x,f, (1) 

N — m '-^ 

j = 1 



where m indexes time within fixation trajectories x, = (x,, y,) (so 
in physical units Af = m/Zj) and N is the number of points in the 
time series (Engbert and Kliegl, 2004). 

For all subjects, the mean square displacement (Figure 3A) 
grows more quickly at short time scales than expected for uncor- 
rected dynamics, with ~ Af^-^ consistent with superdiffusive 
dynamics, before tapering off at long times. This behavior is con- 
sistent across all subjects, although there is some variation in the 
precise Af dependence. To quantify this, we calculate the scaling 
exponent H given by the log-log slope of vs. Af, shown for 
all subjects in Figure 3B. Common to all subjects, the initial scal- 
ing exponent Hi (the log-log slope between Af = 2 ms and 
Af = 4ms) is in the superdiffusive regime with Hi = 1.41 ± 0.07 
(mean ± SD). The H curves all remain in the superdiffusive 
regime for Af < 30 ms, rising to apeakof Hpeak = 1-53 ± 0.09 at 
corresponding time lag Afpeak = 1 1 ± 4 ms. The peak is particu- 
larly distinct in seven subjects (1-7), while the remaining subject 
(8, purple) exhibits a flatter H with a small peak at At = 6 ms. 
The three parameters Hi, Hpeak> and Afpeak are thus a useful 
characterization of the short-time scaling behavior. Single-subject 
values are given in Table 1. 

At longer time scales, the H curves dip to a minimum at Af ~ 
60-80 ms in six subjects (1,2,4-7), with two subjects's exponents 
dipping into the H < 1 subdiffusive regime. Five subjects exhibit 
a shallower local minimum H, remaining in the superdiffusive 
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FIGURE 2 I Step length upper cumulative distribution functions and 
fits for all subjects. Data (black) and fits to lognormal (blue), Weibull 
(yellow), power law with exponential cutoff (green), and exponential (red) 
distributions. (A-H) Subjects 1-8, respectively. Fits are maximum likelihood 
estimates to the tail above step lengths of 0.1 px using the methods of 
Clauset et al. (2009). 



regime. At longer time scales again, all the H curves increase again 
to peak at Af ~ 100-200 ms, and become diffusive or subdiffu- 
sive on time scales of Af > 400 ms on the order of the entire 
fixation length. Subject 7's data (blue) is in the superdiffusive 
regime even at these long time scales, and exhibits oscillations. 

Our analysis of mean squared displacement is motivated by 
previous studies (Engbert and Kliegl, 2004; Mergenthaler and 
Engbert, 2007) and has the benefit that the results are immediately 
relatable to gaze position. Equivalently we could have analyzed 
spectra in the Fourier domain — scaling exponents and spectral 
exponents are closely related (Mandelbrot and Van Ness, 1968). 
Power spectra of eye position during fixation are known to reveal 
approximately 1//^ spectra at low frequencies (Eizenman et al, 
1985; Kuang et al, 2012), and deviations from this functional 



form indicate the presence of processes other than Brownian 
motion, and indeed this is observed at high frequencies (i.e., short 
time scales), consistent with our time-domain findings. 

It is important to test whether the observed correlations can be 
explained solely by the observed anisotropy in the turn angle dis- 
tribution. A purely uncorrected 2-D random walk would have 
uniform turn angle distribution, whereas anisotropy imposes 
short-time correlations even when the individual steps are drawn 
independently. Thus we construct a surrogate walk with steps ran- 
domly chosen from the empirical distribution in Figure ID and 
estimate the scaling exponent as above. The resulting surrogate 
walk is initially weakly persistent with H = 1.13 (Figure 3C). This 
exponent decays to H < 1.05 for Af > 10 ms, consistent with 
normal diffusion and strongly inconsistent with the correspond- 
ing scaling behavior observed empirically (Figure 3B). This can 
also be shown by shuffling the order of the step vectors and recon- 
stituting them to generate a surrogate walk that preserves step 
distribution by construction (Engbert and Kliegl, 2004). Thus the 
random walk must be correlated beyond the extent imposed by 
anisotropy in its movements. 

It is also conceivable that drift correlations could arise from 
muscle tension remaining after a saccade. To test this we cal- 
culate H using only the middle 50 ms of each fixation, thereby 
keeping well away from nearby saccades by restricting the anal- 
ysis to a central time window of each individual FEM epoch. 
Scaling exponents restricted in this way are broadly similar to 
those using the original wider range (Figure 4). In particular 
these data on the middle 50 ms of each fixation show the same 
basic trend of H beginning in the superdiffusive regime, increas- 
ing over short time scales, then falling over medium time scales. 
Note that the rightmost points on the red curves are averages 
over fewer observations than the corresponding points on the 
blue curves (hence subject 8's sharp peak is likely a spurious 
effect of limited observations). Interestingly, in all cases the mid- 
dle 50 ms is more superdiffusive than the uncut fixations — the 
superdiffusive behavior is thus not an artifact of recent/imminent 
saccades. 

Another potential contribution to drifts measured using 
video-based eye trackers is cross-talk between pupil size and posi- 
tion following sudden luminance changes (Kimmel et al, 2012). 
To quantify the magnitude of pupU size fluctuations, we calculate 
the coefficient of variation (SD/mean) of pupil size across each 
fixation. The distribution of pupU size varied only slightly within 
each fixation (Figure 5A), with a median within-fixation change 
of 0.6% across all subjects. This is consistent with our film stimuli 
having relatively slowly-varying luminance. Pupil size fluctua- 
tions are thus unlikely to be a significant factor in determining 
the drift statistics. 

Beyond turn angles, we can also characterize drifts by their 
overall directions relative to preceding and following saccades. 
Figure 5B shows the distribution of fixation angles (directions 
of vectors joining first and last points) relative to the pre- 
ceding saccade angles (blue) and to the next saccade angles 
(red), pooling across all subjects. The main feature is that the 
drift direction distributions are relatively uniform, such that 
drifts are not strongly driven by the previous and next sac- 
cades. As a weaker effect, small peaks in the distributions suggest 
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FIGURE 3 I Mean squared displacement and scaling exponent for all 
subjects. (A) D^, with a reference line (black) with slope H = 1.5. (B) Scaling 
exponent H for mean square displacement given by the gradients of the 



curves. Subjects 1-8 colored in order red, yellow, light green, green, cyan, light 
blue, blue, purple. (C) Scaling exponent for a surrogate walk with uncorrelated 
steps drawn from the empirical anisotropic step distribution in Figure 1C. 
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At (ms) 



10 10 
At (ms) 



FIGURE 4 I Scaling exponents for all subjects when restricted to 
middle 50 ms of each fixation. (A-H) Subjects 1-8, respectively, showing 
scaling exponent H data (blue, same as Figure 3A) and for the middle 
50 ms of each fixation (red). 



small biases in the directions parallel (A6 = 0) and antiparal- 
lel (A9 = ±Tt) to the neighboring saccades. Drifts are slightly 
biased against the previous saccade direction, consistent with the 
existence of slow corrective motions following saccades (Weber 




10 10 10 
pupil area CV 



n/2 0 n/2 
AG (rad) 



FIGURE 5 I Pupil area fluctuations and relationships between fixation 
and saccade directions. (A) Distribution of pupil area coefficient of 
variability (CV; SD/mean) calculated for each fixation, data pooled across all 
subjects. (B) Difference AQ between fixation drift angle (direction of vector 
joining fixation start to fixation end) and previous saccade angle (blue), and 
between fixation angle and next saccade angle (red). 



and Daroff, 1972). Drifts are slightly biased toward the next sac- 
cade direction, possibly reflecting "stick-slip" trajectories during 
pursuits. 

Drift directions measured relative to the screen reveal 
anisotropies (Figure 6), with drifts tending to foUow the hori- 
zontal and vertical axes (6 = 0, ±Tt/2, ±Tt). This is similar to 
the well-known cardinal bias in saccade directions. Here, not 
all axes are favored in all individuals, suggesting that the bias is 
idiosyncratic rather than driven directly by the film. 

3.2. MODEL OF FIXATION DRIFTS 

The correlations dominating the observed short-time drift 
dynamics imply a memory of past gaze locations. In fact the 
scaling exponent H lies close to the exact exponent oi H = 1.5 
for a 2-D self-avoiding random walk (Madras and Slade, 1993). 
This is an idealized random walk on a lattice that is con- 
strained to avoid all previously visited points. Self-avoiding walks 
have been widely used to model the physical configurations 
of polymers, where self-avoidance arises from the requirement 
that no two atoms occupy the same location. It is intuitively 
appealing to apply the self-avoiding random walk to FEMs 
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because this mechanism would straightforwardly prevent adap- 
tation to inputs. However, a problem with this approach is that 
the eye cannot be perfectly self avoiding; if it were it would 
rapidly become trapped with no possible steps remaining. The 



lattice restriction is also unrealistic given that the eye can turn 
through any angle, rather than just a small number of discrete 
choices. 

Here we construct an approximately-self-avoiding random 
walk that overcomes these limitations. The model's key ingredi- 
ent is a brief imprecise memory of recently visited points, which 
biases the step turn angles to avoid retracing this history. The bias 
is imposed by choosing each turn angle from a continuous dis- 
tribution weighted by the density of recent gaze history in each 
direction. Thus the model tends to avoid directions that have been 
well traversed, but not as strictly as the idealized self-avoiding 
walk. Modified self-avoiding walks on the lattice that do not 
trap themselves have also been developed for modeling polymer 
growth (Amit et al, 1983; Kremer and Lyidema, 1985). Our sim- 
ple model is illustrated in Figure 7. An example trajectory is 
shown in Figure 7A. Past locations are remembered imprecisely, 
with each recent gaze point Gaussian-blurred in space, consis- 
tent with the fact that the visual system has only finite spatial 
resolution and is noise-limited at the smallest scales. Figure 7B 
shows the corresponding angular density n(Q) of the history as 
"viewed" from the current gaze location. Directions with high 
n(6) are penalized relative to directions that have not been vis- 
ited recently, yielding the probability p (6) for the next turn angle, 
shown in Figure 7C. 

To describe the model in more detail, let = (xj, yj) be a 
walk with steps indexed by j, and let ym be the history that 
influences the dynamics (a subset of Xj, or possibly the whole 
trajectory thus far). We seek the probability p(Q) that the next 
step Xj_|-,- lies in the direction 6. For now we restrict our atten- 
tion to walks with unit steps, but this could be relaxed to choose 
step lengths from the conditional distributions in Figure 1 (alter- 
natively the step lengths could emerge from a more detailed 
model). 

Assume that p{Q) oc h[n(Q)], where n(9) is the number 
density of points in the direction 6, and h{n) is a func- 
tion that penalizes directions with large n. We choose h{n) = 
exp(— an), where a parameterizes the strength of the penalty 
(large a implies a strong penalty). An exponential penalty 
has also been used in a self-avoiding walk confined to a lat- 
tice (Amit et al., 1983). The important point is that h{n) 
is a decreasing function of «(9), so that directions in which 
n(6) is high will have a correspondingly small probability of 
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FIGURE 6 I Fixation drift directions relative to the screen. (A-H) 

Subjects 1-8, respectively, showing distributions of fixation drift angles 9, 
where 9 = 0, ±it denotes horizontal drifts and 9 = ±ii/2 denotes vertical 
drifts. 




FIGURE 7 I Model mechanism. (A) Example fixational gaze trajectory with imprecise history representation shaded. (B) History angular density n(9) as 
measured from the current point. (C) Resulting angular distribution p{Q) for the next step after penalizing the history. 
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being chosen for the next step. Hence the random walk will 
tend to avoid its history and move to relatively unexplored 
regions. 

To calculate «(6) we need to count the number of history 
points in each direction. For a continuous turn angle distribution 
(A6 0), a perfect history represented as a set of points would 
result in only very precise angles being penalized (exactly the 9,„, 
a set of measure zero), thereby yielding a uniform distribution 
and no memory effect. Instead, we note that the visual system has 
only finite spatial resolution, such that the representation of the 
history is blurred in space. We thus model the history as a sum of 
2-D Gaussians given by 



n(x) = ^ 



m = 1 



2na- 



exp 



(2) 



where o is the width of the Gaussians and x is the memory length 
(in time points, multiply by 2 ms for physical units); / M(x)dx = t 
since the Gaussians are normalized. We assume that x is finite and 
that the memory starts at the immediate previous point. Thus the 
step from Xj to Xj_|- 1 depends on points jm = {xj_ i , . . . , Xj_ t } ■ 

By converting Equation (2) to polar coordinates x = (r, 6) and 
y„ = (r^, Q„) centered on the current gaze location and integrat- 
ing over r, the number of points contained in an infinitesimal 
range of angle dQ is 



niQ) = ^J2 1 1 + V^J?m(e)e^'"('"'erfc [-«„(e)] ], 



(3) 



where -Rm(6) = r„, cos(6 — %„)/(ay/2), and erfc is the comple- 
mentary error function. Thus for the probability of choosing 
angle 6 we have 



exp[— aM(9)] 
/o^" exp[-an(e)] , 



(4) 



with m(6) given by Equation (3). 

For numerical simulation, walks are generated by iteratively 
drawing turn angles from the distribution p(6), updated at 
each time step (Af = 2 ms) to incorporate the updated history. 
Simulation lengths are given in figure captions 8 and 9. 

3.3. MODEL DYNAMICS 

The model has only three parameters: the number of points 
in the memory x (memory has finite length), the width of the 
blurred memory points a (memory has finite precision), and the 
strength of the penalty for moving in previously-visited direc- 
tions a. Parameters x and cr in particular have a simple intuitive 
meaning in describing the temporal and spatial properties of 
the memory representation. We explore the model's behavior by 
varying these parameters about the nominal set t = 20 ms (10 
time points), a = 1, and 0 = 1 px. 

Memory length x (Figure 8A) predominantly determines the 
dominant time scale in the scaling exponent H's temporal profile. 
The overall shape agrees well with the data for Af < 100 ms, with 
H rising to a peak then falling to a value consistent with normal 
diffusion. For t = 0 there is no memory and the dynamics are 
diffusive {H ^ 1). For small x > 0, increasing x increases the ini- 
tial H rapidly and yields a peak in H at non-zero Af. Increasing x 
increases the height of this peak H and shifts it to the right, while 






FIGURE 8 I Model scaling exponent parameter dependences. 

Simulations used curves averaged over 100 walks of 500 steps, 
around the nominal parameter set x=10 time points (20ms), a=l, 
and a = 1 px. (A) Varying memory length x; curves from bottom to top 
are for x = 0, 2, 5, 10. 15, 20, 25 time points, respectively, (B) Varying 



memory precision o; curves from bottom to top are for 



: -2, -1,6, -1,2, -0,8, -0,4, 0, 0,4, respectively, (C) Varying 



memory penalty strength a; curves from bottom to top are for 
logioa = -2, -4/3, -2/3, 0,2/3,4/3, 2, respectively Bottom row: 2-D 
parameter space a vs a with x=10, (D) Hi, (E) Hpeak. (F) Atpeak. 
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H at small Af increases only weakly. Comparison with Figure 8B 
suggests that a short memory of t = 5 points (10 ms) is consistent 
with the data. 

Memory width a (Figure 8B) predominantly determines the 
peak height in H's temporal profile. For very small cr, the dynam- 
ics are diffiisive because only very narrow angles are penalized. 
Increasing a for small to moderate values yields increases in H 
(both for small Af and at the peak), with only a weak effect on 
the peak's Af. The effect of varying a is thus approximately inde- 
pendent of the effect of changing x. Further increasing 0 to large 
values reduces H, leading again to normal diffusion with H ~ 1 
when the history is so broad that it extends even into the forward 
direction, penalizing all directions approximately evenly (upper 
regions of Figures 8D,E). 

The history penalty strength a (Figure 8C) controls the 
degree of superdiffiisivity in the dynamics. For small a the 
history only penalizes very heavily-visited directions, and 
hence the dynamics are approximately diffusive throughout. 
For large a, any visited directions are highly unlikely to be 




10 10 
At (ms) 



10 10 
At (ms) 



chosen. In particular, backward steps are unlikely, leading to 
approximately ballistic motion at short times {H ^ 2), tend- 
ing to normal diffusion at longer times. This is similar to 
a walk restricted to only a narrow range of turn angles, 
where many steps are needed before the dynamics can appear 
diffusive. 

Thus the three parameters each act on distinct features of the 
H curves. Peak exponent Hpeak is most strongly determined by cr, 
peak time fpeak is most strongly determined by r, and initial expo- 
nent Hi is most strongly determined by a, which also smoothly 
modulates the overall dynamics between purely normal diffusion 
and strongly ballistic short-time behavior. 

3.4. PARAMETER ESTIMATION FROM DATA 

As a proof of principle, we use a simple model inversion approach 
to infer parameters from individual data sets. Our intention is 
to demonstrate how this simple model can be fitted to data, 
an important initial step before implementing a formal model 
inversion scheme. The method involves first numerically gen- 
erating H curves across the 3-D parameter space, then using 
this set of curves as a lookup table to find the parameters that 
best fit the data-derived curves. Since x is discrete, the space 
is adequately spanned by a series of 2-D slices in which a and 
a vary at fixed x. Moreover, since Afpeak increases with x and 
our data all have Afpeak < 20 ms, only roughly twenty such 
slices are needed. One such slice is shown in Figures 8D-F. 
We sampled each 2-D slice with a 51 x 51 logarithmically- 
spaced grid in a and a spanning the region explored in 
Figure 8. Best fits were chosen as those with the minimum 
sum of least-squared errors over the range of short time scales 
Af < 20 ms. 

Figure 9 shows H curves for both the data and the model 
using estimated parameter values listed in Table 2. The H data 
and fitted curves agree well for short time scales, with resid- 
ual errors <5% for at least 20 ms in Af in all subjects. The 
good agreement between model and data extends beyond the 
fitted range in most subjects, up to at least 60 ms for three sub- 
jects (Panels A, D, and G). Three of the fits (Panels B, E, and 
F) agree well for the first 20-30 ms but do not have the same 
curvature; the dip in H appears to be too steep for the model. 
The dip in H is also made more pronounced by the data's 
trend toward H = I being interrupted by a second peak at 
Af = 100-200 ms. 



Table 2 | Estimated model parameter values corresponding to scaling 
exponents of the data in Figure 9. 



FIGURE 9 I Comparison data and model fits for all subjects. Parameters 
given in Table 2. IVIodel curves are obtained from average curves of 100 
walks of 500 steps. (A-H) Subjects 1-8, respectively, showing scaling 
exponent H data (blue, same as Figure 3A) and model fits (red). 
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4. DISCUSSION 

Fixational eye movements are an ever-present source of 
fluctuations in the visual input stream, continually perturbing 
retinal images during periods when information is extracted 
from the external environment. We have shown that FEM- 
induced fluctuations in the visual stream during natural vision 
are not simply additive measurement noise: they have a par- 
ticular correlation structure, distinguishing them from simple 
uncorrelated processes such as Brownian motion. Our analysis 
yields insights into possible underlying mechanisms and links 
FEMs in natural vision to the broad class of anomalous diffu- 
sion phenomena occurring in diverse complex systems including 
vestibulo-oculomotor neurons, posture control, particle trans- 
port, bacterial motion, and animal foraging (Anastasio, 1994; 
Collins and De Luca, 1994; Viswanathan et al., 1999; Metzler and 
Klafter, 2000; Codling et al, 2008). 

Studies of FEMs have mostly focused on data obtained with 
head restraints during forced fixation of static shapes, raising 
questions of the work's relevance to more ecologically-relevant 
activities (CoUewijn and Kowler, 2008). Microsaccades have 
now been detected in several conditions with varying degrees 
of increased realism, including head-unrestrained fixation tasks 
(Martinez-Conde et al., 2006) and free-viewing (Otero-Millan 
et al., 2013), head-restrained fbcation (Poletti and Rucci, 2010; 
Di Stasi et al, 2013) and free-viewing (Otero-Millan et al., 2008; 
Mergenthaler and Engbert, 2010; Otero-Millan et al, 2013) of 
static natural scenes, use of stimuli encompassing the entire visual 
field (Otero-Millan et al., 2013), fixation tasks with dynamic 
artificial stimuli (Laubrock et al., 2008), and in a dynamic sim- 
ulated driving environment (Benedetto et al, 2011). Here, we 
presented dynamic natural scenes that more closely approxi- 
mate real-world viewing conditions than do typical static stimuli. 
Our finding of correlated drift trajectories implies that these 
correlations are not merely responses to the constrained tasks 
and specialized featureless static stimuli used in many studies. 
Furthermore, because these correlations exist during viewing of 
dynamic stimuli, movement does not obviate the need for drifts. 
Motion is ubiquitous in the real world and thus many of the 
objects we fixate on will inevitably be moving targets, with tra- 
jectories that might be simple (cars driving past) or complicated 
(hand movements while someone talks) — visual input is ulti- 
mately a convolution of object motion with eye motion. The 
need to move beyond static images to dynamic and task-driven 
visual input is becoming widely accepted, particularly for model- 
ing gaze allocation (Tatler et al., 2011) and gaze strategies that 
evolve over time (Wang et al., 2012). Film stimuli beneficially 
enable long recordings in response to an approximately smooth 
and continuous visual stream. While directed films are particu- 
larly engaging for viewers (Hasson et al., 2004, 20 10), the presence 
of numerous cuts alters the gaze behavior relative to undirected 
videos (Dorr et al, 2010). Our use of Rope was motivated by 
it having relatively few cuts, thus being closer to natural con- 
ditions than typical films. Although static natural images could 
provide real-world scene statistics, building up robust statistics 
over a large ensemble of fixations would require presentation 
of many such images. This results in a long, discontinuous 
slideshow, characterized by recurring surprise then habituation. 



that is arguably less engaging and realistic than a single dynamic 
natural scene. 

The observed drift scaling properties are broadly consistent 
with those found during fixation tasks (Engbert and Kliegl, 2004; 
Mergenthaler and Engbert, 2007). This includes the subdiffusive 
confining effect at medium-length time scales, implying that sub- 
diffusion is not entirely due to the deliberate effort of lengthy 
precise fixation on a target. Nor do our findings attribute subd- 
iffusion to corrective microsaccades, in agreement with previous 
findings (Mergenthaler and Engbert, 2007). We found no evi- 
dence for a distinct high-velocity component in the gaze step dis- 
tributions and only a relatively low microsaccade rate consistent 
with free-viewing of static images (Mergenthaler and Engbert, 
2010), supporting the view that the importance of microsaccades 
is downplayed in natural unconstrained vision relative to fixation 
tasks (CoUewijn and Kowler, 2008). The peak in scaling exponent 
seen in all subjects near At = 100 ms may be due to the fact that 
after 3-5 video frames fixation targets have moved far enough 
to yield drifts larger than expected for a stationary target, or 
alternatively due to delayed oculomotor feedback with this char- 
acteristic time scale (Mergenthaler and Engbert, 2007); additional 
work is needed to disambiguate these possibilities. Involvement 
of the latter mechanism would also suggest a means for induc- 
ing history effects without a needing precise memory of past gaze 
positions. 

The fact that our observed superdiffusive behavior agrees 
with that seen in static fixation tasks (Engbert and Kliegl, 2004; 
Mergenthaler and Engbert, 2007) also shows that increased 
scaling exponents are simply not due to the pursuit trajec- 
tories in our data. Although the peak in turn angle dis- 
tributions at 0° is partly due to pursuits, this anisotropy 
alone does not explain the scaling behavior: surrogate walks 
with the same anisotropy do not yield strong superdiffusiv- 
ity. It would be ideal to cleanly distinguish "true" fixations 
and "true" pursuits, but the distinction is highly non-trivial 
for low target velocities. Our findings suggest that even 
despite the natural variability of target motion, the over- 
all statistics agree well with those found in typical fixation 
tasks. 

Our use of a head-mounted eye tracker was motivated by 
the greatly improved subject comfort levels and naturalistic con- 
ditions it allows relative to fixed-head systems. This enabled 
lengthy continuous acquisitions, ideally suited to studying eye 
movements in film viewing. While slow calibration drifts can 
hinder head-mounted systems, these are unlikely to affect eye 
movements occurring within the few hundred milliseconds of 
each fixation. That is, we focused on gaze trajectories measured 
relative to fixation onset, not absolute fixation positions, thus 
requiring high precision rather than high accuracy. Note that the 
EyeLink II system tracks eye movements in the moving refer- 
ence frame of the head, and tracks head position in three-space 
at 500 Hz, compensating for head rotations and translations. 
Together these measurements yield eye position in the fixed 
plane of the screen, thus the drifts remain even after account- 
ing for postural sway and vestibulo-ocular corrections. Although 
postural sway trajectories also exhibit a transition from persis- 
tent to antipersistent correlations (Collins and De Luca, 1994), 
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this occurs on much longer time scales than in FEMs (Engbert 
and Kliegl, 2004), further supporting an origin in the visual 
system. 

We showed that the potential confound of cross-talk between 
pupil size and position following sudden luminance changes 
(Kimmel et al, 2012) is not a significant factor for films with 
relatively slowly-varying luminance. The low noise level of video- 
based eye trackers (0.01° RMS) is advantageous for studies 
of small-amplitude FEMs, and the same model of eye tracker 
has been used to study microsaccades in a head-unrestrained 
setup (Martinez-Conde et al, 2006). Moreover, the spectral 
power of the machine noise has been shown using an arti- 
ficial pupil to be significantly smaller than that observed for 
a real eye (Wallis, 2006), and the near-flat spectrum cor- 
responds to a near-uncorrelated noise source which is thus 
unable to induce the observed correlation structure. Thus our 
results are robust against the noise in our video-based record- 
ings. Ultimately the main benefit of head-mounted video-based 
eye trackers is that they enable measurements under natural 
free-viewing conditions unencumbered by the onerous head- 
stabilization and/or physical interference with the eye required 
of techniques such as dual-Purkinje image systems and scleral 
coils. Although such systems are capable of lower noise lev- 
els, this is associated with such intrusions that limit ecological 
validity. We hope that drift scaling properties wiU be measured 
using these alternative techniques, perhaps shedding light on 
whether any discrepancies are due to having a contact lens in 
the eye and using bite-bar stabilization, versus the limitations 
of video-based pupil tracking, or varying levels of fatigue, for 
example. 

The model developed here is a suitable generative model for 
use in inversion schemes to estimate parameters from data. Our 
simple inversion method is a proof of principle for inferring 
physiological parameters from FEM recordings, and could pro- 
vide initial estimates for more sophisticated algorithms (e.g., 
variational Bayes). Indeed with a larger cohort and improved 
fitting methods we expect that the parameter estimates could 
be constrained more tightly than the present somewhat-broad 
ranges, yielding a sharper probe of the underlying physiology. 
Our inferred memory lengths in the range 6-32 ms are con- 
sistent with typical physiological time constants in premotor 
neurons of the oculomotor system (Cannon et al., 1983; Aksay 
et al., 2007), though time constants up to 100 ms have also 
been implicated (Seung, 1996; Seung et al, 2000). Systematic 
parameter inference across data sets also opens the possibil- 
ity of studying group differences between normal and clinical 
populations. 

The current model is particularly simple, but admits sev- 
eral straightforward future extensions that might redress the 
mismatch at long time lags, at the cost of additional parame- 
ters. Currently, the memory's influence on the dynamics begins 
immediately and ends abruptly after a fixed duration. These 
properties can be modified by turning the memory on only 
after a delay (to incorporate a latent period before the history 
is encoded), or for its influence to decay smoothly by setting 
a = a(f) with some characteristic decay rate. Such behavior 
could also be modeled by setting a = a{t) to be increasing in 



time, degrading memory precision through "diffusion" of the 
stored representation. Correlation of drift with the previous sac- 
cade direction could also be modeled by retaining some mem- 
ory of the previous saccade, rather than beginning anew each 
fixation. 

A recent model of FEMs has also incorporated self-avoidance 
as the key mechanism driving drifts observed in fixation tasks 
(Engbert et al, 2011; Engbert, 2012). The model encodes his- 
tory by treating space as a lattice and recording the num- 
ber of visits to each site, and the random walk proceeds by 
choosing the least-visited neighbor at each step. The model 
also includes a confining potential to keep the random walk 
near the origin, which is needed for the long-time subdif- 
fusive nature of fixation tasks, as well as a mechanism for 
triggering microsaccades when occupying highly-visited sites. 
These additional mechanisms are compatible with our model, 
although we have not included them for simplicity in the 
absence of compelling evidence for them in our data. For exam- 
ple, «(x) could be augmented with a radially-varying compo- 
nent to prevent long fixations from straying too far from the 
target. However, for natural vision, a static confining poten- 
tial should ideally be modified to account for fixation of 
moving targets, requiring a mechanism for predicting target 
trajectories. 

The self-avoiding mechanism is a natural way of avoiding 
adaptation. A feedback system that tends to avoid previously- 
encoded scene representations (e.g., involving VI) will tend to 
yield novel transient stimuli and thus stronger neural responses 
than for a static rapidly-adapted representation. Such behavior 
improves sampling of fine spatial detail (Donner and Hemila, 
2007; Otero-Millan et al, 2008); indeed even uncorrelated per- 
turbations to retinal images reduce redundancy and enhance 
feature extraction by removing low spatial frequencies (Kuang 
et al., 2012). Dependence of drift trajectories on scene statis- 
tics might be expected if FEMs act as an optimal search process. 
Superdiffusive behavior is associated with optimal search strate- 
gies (Viswanathan et al, 1999; Sims et al., 2008), and has been 
argued to explain saccadic scan paths (Brockmann and Geisel, 
2000). Memory of past states might alternatively be embodied 
independently of the visual scene at the level of brainstem pre- 
motor neurons (Seung, 1996; Seung et al., 2000), or by motor 
neurons in the superior coUiculus (Mergenthaler and Engbert, 
2007; Hafed et al., 2009; Engbert et al., 201 1), or in cortical areas 
(Pierrot-Deseilligny et al., 2004). All mechanisms that involve a 
memory of eye position must overcome challenges of biological 
noise. We argue that models will ultimately need to encode only 
imprecise history representations, similar to our implementation 
here. Integrated with brain imaging, the present approach has 
the potential to greatly inform the neural mechanisms underly- 
ing the accumulation of evidence about the statistical structure 
of natural scenes (Levy et al., 2004) and the temporal hierar- 
chies of human narratives (Hasson et al, 2008; Honey et al., 
2012). 
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